Flow Matching
Flow Matching is a simulation-free approach to training continuous normalizing flows (CNFs) by directly regressing a vector field that generates a desired probability path. Unlike [[Diffusion Model|diffusion models]], which are usually derived from [[Stochastic Differential Equation (SDE)|SDE]] theory, Flow Matching offers a simpler and more flexible framework for generative modeling based on ordinary differential equations (ODEs).
1. Core Concept
1.1 Motivation
Problems with existing methods:
- [[Diffusion Model|Diffusion Models]]:
  - Require complex [[Stochastic Differential Equation (SDE)|SDE]]/ODE theory
  - Need to solve Fokker-Planck equation
  - Constrained by specific noise schedules
- [[Continuous Normalizing Flow]]:
  - Likelihood computation is expensive
  - Training requires simulating ODE trajectories
  - Limited architectural choices
- Score-Based Models:
  - Require score matching objectives
  - Complex mathematical derivation
Flow Matching solves these by:
- Direct vector field regression (no simulation needed)
- Flexible conditional flow design
- Simpler mathematical foundation
- Connections to optimal transport
1.2 Key Idea
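Instead of deriving the ODE from a stochastic process, Flow Matching directly learns a velocity field $v_\theta(t, x)$ that transports samples from a simple distribution (e.g., a Gaussian) to the data distribution by integrating

$$\frac{dx_t}{dt} = v_\theta(t, x_t), \quad x_0 \sim p_0$$

where $p_0$ is the simple prior and the state $x_1$ reached at $t = 1$ is approximately distributed as the data.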
[!NOTE] Core Insight
Flow Matching bypasses the need for [[Stochastic Differential Equation (SDE)|SDE]] theory and score matching by directly learning the velocity field that generates the desired probability flow, making it conceptually simpler and more flexible.
2. Mathematical Foundation
2.1 Continuous Normalizing Flows
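A continuous normalizing flow (CNF) is defined by an ODE:

$$\frac{dx_t}{dt} = v_t(x_t), \quad x_0 \sim p_0$$

where:
- $p_0$: Simple prior distribution (e.g., $\mathcal{N}(0, I)$)
- $p_1$: Target data distribution
- $v_t$: Time-dependent velocity field

Probability path: The flow induces a family of densities $p_t$, $t \in [0, 1]$, that interpolates between $p_0$ and $p_1$ and satisfies the continuity equation $\partial_t p_t + \nabla \cdot (p_t v_t) = 0$.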
2.2 Flow Matching Objective
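Goal: Learn a parametric velocity field $v_\theta(t, x)$ that matches a target vector field $u_t(x)$ generating the probability path $p_t$.

Flow Matching Loss:

$$\mathcal{L}_{\text{FM}}(\theta) = \mathbb{E}_{t \sim \mathcal{U}[0,1],\, x \sim p_t(x)} \left\| v_\theta(t, x) - u_t(x) \right\|^2$$

where $u_t(x)$ is the target vector field that generates $p_t$.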
2.3 The Challenge
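The problem: The marginal vector field $u_t(x)$ and the marginal path $p_t(x)$ are not available in closed form, so the Flow Matching loss cannot be evaluated directly.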
Solution: Use Conditional Flow Matching (CFM).
3. Conditional Flow Matching (CFM)
3.1 Key Insight
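Instead of matching the marginal vector field $u_t(x)$, match a conditional vector field $u_t(x \mid x_1)$ defined per data point $x_1 \sim q(x_1)$, which is tractable by construction.

Conditional Flow Matching Loss:

$$\mathcal{L}_{\text{CFM}}(\theta) = \mathbb{E}_{t \sim \mathcal{U}[0,1],\, x_1 \sim q(x_1),\, x \sim p_t(x \mid x_1)} \left\| v_\theta(t, x) - u_t(x \mid x_1) \right\|^2$$

Theorem: Under certain conditions (in particular $p_t(x) > 0$ for all $x$ and $t$), $\nabla_\theta \mathcal{L}_{\text{FM}}(\theta) = \nabla_\theta \mathcal{L}_{\text{CFM}}(\theta)$, so minimizing the CFM loss has the same optimum as minimizing the FM loss.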
3.2 Conditional Probability Path
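Given a data point $x_1$, define a Gaussian conditional probability path:

$$p_t(x \mid x_1) = \mathcal{N}\left(x \mid \mu_t(x_1),\, \sigma_t(x_1)^2 I\right)$$

where:
- $\mu_t(x_1)$: Time-dependent mean
- $\sigma_t(x_1)$: Time-dependent standard deviation

Boundary conditions:
- $t = 0$: $\mu_0(x_1) = 0$, $\sigma_0(x_1) = 1$, so $p_0(x \mid x_1) = \mathcal{N}(x \mid 0, I)$ (prior)
- $t = 1$: $\mu_1(x_1) = x_1$, $\sigma_1(x_1) = \sigma_{\min}$, so $p_1(x \mid x_1)$ concentrates around $x_1$ (data point)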
3.3 Conditional Vector Field
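For the Gaussian conditional path, the conditional vector field is:

$$u_t(x \mid x_1) = \frac{\sigma_t'(x_1)}{\sigma_t(x_1)} \left(x - \mu_t(x_1)\right) + \mu_t'(x_1)$$

Simplified form: writing a sample as $x = \mu_t(x_1) + \sigma_t(x_1)\,\epsilon$,

$$u_t(x \mid x_1) = \sigma_t'(x_1)\,\epsilon + \mu_t'(x_1)$$

where $\epsilon \sim \mathcal{N}(0, I)$ and primes denote derivatives with respect to $t$.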
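The Gaussian conditional path and its vector field can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the helper names `sample_conditional_path` and `conditional_vector_field` are hypothetical, the schedules are passed in as plain callables, and the specific linear schedule (with $\sigma_t$ independent of $x_1$) is an assumption chosen for simplicity.

```python
import numpy as np

def sample_conditional_path(x1, t, mu, sigma, rng):
    """Draw x ~ N(mu_t(x1), sigma_t^2 I) and return the sample and its noise."""
    eps = rng.standard_normal(x1.shape)
    return mu(t, x1) + sigma(t) * eps, eps

def conditional_vector_field(x, x1, t, mu, sigma, dmu, dsigma):
    """u_t(x | x1) = sigma_t' / sigma_t * (x - mu_t(x1)) + mu_t'(x1)."""
    return dsigma(t) / sigma(t) * (x - mu(t, x1)) + dmu(t, x1)

# Illustrative schedule (an assumption): linear mean, linearly shrinking std.
sigma_min = 1e-2
mu = lambda t, x1: t * x1
dmu = lambda t, x1: x1
sigma = lambda t: 1.0 - (1.0 - sigma_min) * t
dsigma = lambda t: -(1.0 - sigma_min)

rng = np.random.default_rng(0)
x1 = rng.standard_normal((4, 2))      # stand-in for a batch of data points
t = 0.3
x, eps = sample_conditional_path(x1, t, mu, sigma, rng)
u = conditional_vector_field(x, x1, t, mu, sigma, dmu, dsigma)

# Substituting x = mu_t + sigma_t * eps gives the noise-based form of the same field.
u_simplified = dsigma(t) * eps + dmu(t, x1)
```

Because `x` was generated from `eps`, the two expressions agree up to floating-point error, which makes this a convenient sanity check when implementing a new schedule.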
4. Common Flow Designs
4.1 Optimal Transport Flow
Idea: Transport points along straight lines from noise to data.
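Conditional path:

$$p_t(x \mid x_1) = \mathcal{N}\left(x \mid t\,x_1,\, (1 - (1 - \sigma_{\min})\,t)^2 I\right)$$

Equivalently, $x_t = (1 - (1 - \sigma_{\min})\,t)\,x_0 + t\,x_1$ with $x_0 \sim \mathcal{N}(0, I)$.

Conditional vector field:

$$u_t(x \mid x_1) = \frac{x_1 - (1 - \sigma_{\min})\,x}{1 - (1 - \sigma_{\min})\,t}$$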
Advantages:
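- Straight conditional trajectories (easy to integrate)
- Each conditional path is the optimal transport displacement between two Gaussians (the induced marginal flow is not guaranteed to be optimal)
- Fast sampling (few ODE steps are often sufficient)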
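To make the "straight line" claim concrete, the sketch below evaluates the OT conditional vector field at several points along one conditional path and records the velocities. The function names `ot_path` and `ot_vector_field` and the value of `sigma_min` are illustrative assumptions.

```python
import numpy as np

def ot_path(x0, x1, t, sigma_min=1e-4):
    """Point on the OT conditional path: x_t = (1 - (1 - sigma_min) t) x0 + t x1."""
    return (1.0 - (1.0 - sigma_min) * t) * x0 + t * x1

def ot_vector_field(x, x1, t, sigma_min=1e-4):
    """u_t(x | x1) = (x1 - (1 - sigma_min) x) / (1 - (1 - sigma_min) t)."""
    return (x1 - (1.0 - sigma_min) * x) / (1.0 - (1.0 - sigma_min) * t)

rng = np.random.default_rng(0)
x0 = rng.standard_normal((5, 3))   # noise samples
x1 = rng.standard_normal((5, 3))   # stand-in for data samples

# Velocity evaluated on the path at several times.
velocities = [ot_vector_field(ot_path(x0, x1, t), x1, t) for t in (0.0, 0.25, 0.5, 0.9)]

# Along the path the velocity does not depend on t.
constant_velocity = x1 - (1.0 - 1e-4) * x0
```

Every entry of `velocities` equals `constant_velocity`, so each conditional trajectory moves at constant speed in a fixed direction. This is also why the regression target in training can be written without any division.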
4.2 Gaussian Conditional Flow
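General form:

$$p_t(x \mid x_1) = \mathcal{N}\left(x \mid \mu_t(x_1),\, \sigma_t(x_1)^2 I\right)$$

Conditional vector field:

$$u_t(x \mid x_1) = \frac{\sigma_t'(x_1)}{\sigma_t(x_1)} \left(x - \mu_t(x_1)\right) + \mu_t'(x_1)$$

where $\mu_t'$ and $\sigma_t'$ are the time derivatives of the mean and standard deviation.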
4.3 Variance Exploding Flow
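Similar to the VE-[[Stochastic Differential Equation (SDE)|SDE]] in diffusion models, the mean stays at the data point while the noise scale shrinks over time:

$$p_t(x \mid x_1) = \mathcal{N}\left(x \mid x_1,\, \sigma_{1-t}^2 I\right)$$

where $\sigma_t$ is an increasing noise scale with $\sigma_0 = 0$ and $\sigma_1 \gg 1$.

Conditional vector field:

$$u_t(x \mid x_1) = -\frac{\sigma_{1-t}'}{\sigma_{1-t}} \left(x - x_1\right)$$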
4.4 Comparison of Flow Designs
| Flow Type | Trajectory | Transport Cost | Sampling Speed | Stability |
|---|---|---|---|---|
| OT Flow | Straight | Minimal | Very Fast | Good |
| Gaussian | Curved | Moderate | Fast | Very Good |
| VE Flow | Curved | Higher | Medium | Good |
| VP Flow | Curved | Moderate | Fast | Very Good |
5. Training Algorithm
5.1 Flow Matching Training
```python
# Conditional Flow Matching Training
```
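A minimal sketch of one Conditional Flow Matching training step in PyTorch, assuming the OT conditional path and low-dimensional data of shape `(batch, dim)`. The function name `cfm_loss`, the toy MLP, and the way `t` is concatenated to the input are illustrative assumptions; image models would typically use a time embedding instead.

```python
import torch
import torch.nn as nn

def cfm_loss(model, x1, sigma_min=1e-4):
    """Conditional Flow Matching loss for the OT conditional path."""
    t = torch.rand(x1.shape[0], 1)                    # t ~ U[0, 1]
    x0 = torch.randn_like(x1)                         # x0 ~ N(0, I)
    xt = (1 - (1 - sigma_min) * t) * x0 + t * x1      # sample from p_t(x | x1)
    target = x1 - (1 - sigma_min) * x0                # u_t(x_t | x1) on the path
    pred = model(torch.cat([xt, t], dim=-1))          # v_theta(t, x_t)
    # Mean over batch and dimensions: the squared norm up to a constant factor.
    return ((pred - target) ** 2).mean()

# Toy velocity network for 2-D data (hypothetical architecture).
model = nn.Sequential(nn.Linear(3, 64), nn.SiLU(), nn.Linear(64, 2))
x1 = torch.randn(128, 2)                              # stand-in for a data batch
loss = cfm_loss(model, x1)
loss.backward()                                       # gradients for an optimizer step
```

No ODE is solved anywhere in this step: the loss only needs a random time, a noise sample, and a data sample, which is what "simulation-free" refers to.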
5.2 Complete Training Loop
```python
for epoch in range(num_epochs):
    ...
```
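One way such a loop might look end to end is sketched below. Everything concrete here is an assumption made for illustration: the synthetic 2-D dataset, the small MLP, the Adam optimizer with `lr=1e-2`, and the straight-line path with $\sigma_{\min} = 0$.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
data = torch.randn(512, 2) * 0.5 + 2.0        # toy dataset (assumption)
loader = DataLoader(TensorDataset(data), batch_size=64, shuffle=True)

model = nn.Sequential(nn.Linear(3, 64), nn.SiLU(), nn.Linear(64, 2))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
num_epochs = 20

epoch_losses = []
for epoch in range(num_epochs):
    total = 0.0
    for (x1,) in loader:
        t = torch.rand(x1.shape[0], 1)         # t ~ U[0, 1]
        x0 = torch.randn_like(x1)              # x0 ~ N(0, I)
        xt = (1 - t) * x0 + t * x1             # straight-line path (sigma_min = 0)
        target = x1 - x0                       # conditional velocity on that path
        pred = model(torch.cat([xt, t], dim=-1))
        loss = ((pred - target) ** 2).mean()

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        total += loss.item() * x1.shape[0]
    epoch_losses.append(total / len(data))     # average loss per sample
```

The loss does not go to zero even for a perfect model, because the target `x1 - x0` is random given `(t, xt)`. What the network learns is its conditional mean, which is the marginal velocity field.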
5.3 Key Differences from Diffusion Models
| Aspect | Diffusion Models | Flow Matching |
|---|---|---|
| Objective | Score matching / ELBO | Vector field regression |
| Mathematical foundation | [[Stochastic Differential Equation (SDE)|SDE]] theory | ODE theory |
| Target | Score function $\nabla_x \log p_t(x)$ | Velocity field $v_t(x)$ |
| Training | Denoising objective | Direct regression |
| Flexibility | Constrained by [[Stochastic Differential Equation (SDE)|SDE]] | Arbitrary flow design |
| Likelihood | Tractable via ODE | Tractable via ODE |
6. Sampling Algorithm
6.1 ODE Integration
```python
# Flow Matching Sampling
```
6.2 Euler Method (Simple)
```python
def euler_sample(model, x_0, num_steps=100):
    ...
```
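A minimal sketch of a fixed-step Euler sampler with this signature. It assumes `model(t, x)` takes a scalar time and a batch of states and returns the velocity; that calling convention is an assumption, as is the analytic stand-in used in place of a trained network.

```python
import numpy as np

def euler_sample(model, x_0, num_steps=100):
    """Integrate dx/dt = model(t, x) from t = 0 to t = 1 with fixed-step Euler."""
    x = x_0
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt                    # left endpoint of the current step
        x = x + dt * model(t, x)
    return x

# Stand-in for a trained network: a field that moves every point to `target`
# along a straight line, v(t, x) = (target - x) / (1 - t).
target = np.array([2.0, -1.0])
model = lambda t, x: (target - x) / (1.0 - t)

x_0 = np.random.default_rng(0).standard_normal((8, 2))
samples = euler_sample(model, x_0, num_steps=10)
```

Because the velocity is only evaluated at the left endpoint of each step, `t = 1` is never passed to the model. For this straight-line field even 10 steps land on `target`, while curved learned fields generally need more steps or a higher-order solver.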
6.3 Advanced ODE Solvers
| Solver | Order | Steps Needed | Characteristics |
|---|---|---|---|
| Euler | 1st | 100-200 | Simple, slow |
| RK4 | 4th | 50-100 | Accurate, moderate |
| DOPRI5 | Adaptive | 20-50 | Automatic step size |
| [[DPM-Solver]] | Specialized | 10-20 | Fast for diffusion |
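The effect of solver order can be seen on a toy ODE with a known solution. The sketch below compares Euler and classical RK4 at the same step count; the test field $dx/dt = -x$ and the helper names are assumptions for illustration, and the step counts in the table above are rough guidance rather than something this example measures.

```python
import numpy as np

def euler_step(v, t, x, dt):
    return x + dt * v(t, x)

def rk4_step(v, t, x, dt):
    # Classical 4th-order Runge-Kutta: four velocity evaluations per step.
    k1 = v(t, x)
    k2 = v(t + dt / 2, x + dt / 2 * k1)
    k3 = v(t + dt / 2, x + dt / 2 * k2)
    k4 = v(t + dt, x + dt * k3)
    return x + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

def integrate(v, x0, step, num_steps):
    x, dt = x0, 1.0 / num_steps
    for i in range(num_steps):
        x = step(v, i * dt, x, dt)
    return x

# Test field with a known solution: dx/dt = -x  =>  x(1) = x0 * exp(-1).
v = lambda t, x: -x
x0 = np.array([1.0, -2.0])
exact = x0 * np.exp(-1.0)

euler_err = np.abs(integrate(v, x0, euler_step, 10) - exact).max()
rk4_err = np.abs(integrate(v, x0, rk4_step, 10) - exact).max()
```

With 10 steps, `rk4_err` is several orders of magnitude smaller than `euler_err`, at the price of four network evaluations per step instead of one. The useful comparison in practice is therefore error per function evaluation, not error per step.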
7. Theoretical Analysis
7.1 Equivalence to Score Matching
Theorem: Under certain conditions, Flow Matching is equivalent to score matching.
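For the conditional flow: the Gaussian path has score $\nabla_x \log p_t(x \mid x_1) = -\frac{x - \mu_t(x_1)}{\sigma_t(x_1)^2}$, so the conditional vector field can be written as

$$u_t(x \mid x_1) = \mu_t'(x_1) - \sigma_t(x_1)\,\sigma_t'(x_1)\,\nabla_x \log p_t(x \mid x_1)$$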
This shows the connection between velocity fields and score functions.
7.2 Likelihood Computation
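Using the instantaneous change of variables formula:

$$\frac{d \log p_t(x_t)}{dt} = -\nabla \cdot v_\theta(t, x_t)$$

Integrating from $t = 0$ to $t = 1$:

$$\log p_1(x_1) = \log p_0(x_0) - \int_0^1 \nabla \cdot v_\theta(t, x_t)\, dt$$

Divergence computation:
- Exact: $O(d^2)$ cost for $d$-dimensional data
- Hutchinson’s estimator: $\nabla \cdot v \approx \mathbb{E}_{\epsilon}\left[\epsilon^\top \frac{\partial v}{\partial x}\, \epsilon\right]$ with $\mathbb{E}[\epsilon \epsilon^\top] = I$ (stochastic)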
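A small NumPy sketch of Hutchinson's estimator, checked against a field whose divergence is known analytically. Two simplifications are assumptions of this sketch: Rademacher probes are used for $\epsilon$, and the Jacobian-vector product is approximated by a central finite difference, whereas an implementation for a neural network would normally obtain it from automatic differentiation.

```python
import numpy as np

def hutchinson_divergence(v, x, num_probes=1000, h=1e-4, rng=None):
    """Estimate div v(x) = tr(dv/dx) as the mean of eps^T J eps over random probes."""
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.choice([-1.0, 1.0], size=(num_probes, x.shape[0]))  # Rademacher probes
    # Central finite difference stands in for the Jacobian-vector product J @ eps.
    jvp = (v(x + h * eps) - v(x - h * eps)) / (2.0 * h)
    return float(np.mean(np.sum(eps * jvp, axis=1)))

A = np.array([[1.0, 2.0, 0.0], [0.0, -3.0, 1.0], [4.0, 0.0, 0.5]])
v = lambda x: x @ A.T + np.sin(x)        # batched vector field, one point per row
x = np.array([0.3, -1.2, 2.0])

exact = np.trace(A) + np.cos(x).sum()    # analytic divergence of this field
estimate = hutchinson_divergence(v, x, num_probes=10000, rng=np.random.default_rng(0))
```

The estimate is unbiased but noisy. Its variance comes from the off-diagonal entries of the Jacobian, so likelihood evaluation typically averages over several probes or accepts a noisy estimate.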
7.3 Optimal Transport Connection
Benamou-Brenier Formula:
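The squared Wasserstein-2 distance between $p_0$ and $p_1$ can be written as a minimal kinetic energy:

$$W_2^2(p_0, p_1) = \min_{(p_t, v_t)} \int_0^1 \int \|v_t(x)\|^2 \, p_t(x) \, dx \, dt$$

subject to the continuity equation $\partial_t p_t + \nabla \cdot (p_t v_t) = 0$ with endpoints $p_0$ and $p_1$.
The OT conditional flow attains this minimum for each conditional path, which is why its conditional trajectories are straight; the marginal flow it induces is not guaranteed to be the optimal transport map between $p_0$ and $p_1$.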
7.4 Rectified Flows
Key Idea: Iteratively straighten the flow trajectories.
Algorithm:
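- Train an initial Flow Matching model
- Generate sample pairs $(x_0, x_1)$ by integrating the learned ODE from noise $x_0$ to its output $x_1$
- Retrain the model with straight-line interpolation between the paired points: $x_t = (1 - t)\,x_0 + t\,x_1$, with target velocity $x_1 - x_0$
- Repeat 2-3 times
Result: Nearly straight trajectories, enabling few-step and, in favorable cases, 1-step generation.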
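The data-preparation part of one rectification round can be sketched as follows. The helper names `generate_pairs` and `reflow_batch` are hypothetical, the Euler integrator is one possible choice, and a constant-drift function stands in for a trained velocity model.

```python
import numpy as np

def generate_pairs(velocity, x0, num_steps=50):
    """Integrate the current flow with Euler steps to couple each x0 with an x1."""
    x, dt = x0.copy(), 1.0 / num_steps
    for i in range(num_steps):
        x = x + dt * velocity(i * dt, x)
    return x0, x

def reflow_batch(x0, x1, rng):
    """Straight-line interpolation between coupled points and its velocity target."""
    t = rng.uniform(size=(x0.shape[0], 1))
    xt = (1.0 - t) * x0 + t * x1
    target = x1 - x0
    return t, xt, target

rng = np.random.default_rng(0)
# Stand-in for a trained model (assumption): a constant drift of +2 per coordinate.
velocity = lambda t, x: np.full_like(x, 2.0)

x0, x1 = generate_pairs(velocity, rng.standard_normal((16, 2)))
t, xt, target = reflow_batch(x0, x1, rng)
```

The difference from the first training round is the coupling: `x0` and `x1` are no longer drawn independently but are linked by the previous model's ODE. Retraining on these pairs is what reduces the crossing of interpolation lines.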
8. Advanced Variants
8.1 Rectified Flow
Motivation: Straight trajectories are easier to integrate.
Method:
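- Learn the displacement (residual) velocity along the straight path $x_t = (1 - t)\,x_0 + t\,x_1$: $v_\theta(t, x_t) \approx x_1 - x_0$
- Iteratively “rectify” the flow
- Achieve 1-2 step generation with high quality
Loss:

$$\mathcal{L}(\theta) = \mathbb{E}_{t \sim \mathcal{U}[0,1],\, (x_0, x_1)} \left\| v_\theta(t, x_t) - (x_1 - x_0) \right\|^2$$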
8.2 Flow Matching with Prior Blending
Idea: Use learned prior instead of fixed Gaussian.
Benefits:
- Lower transport cost
- Faster convergence
- Better sample quality
8.3 Multimodal Flow Matching
Challenge: Standard flows are deterministic mappings (bijective).
Solution: Use mixture of flows or stochastic interpolation.
8.4 Comparison Table
| Variant | Trajectory | Steps | Quality | Training Cost |
|---|---|---|---|---|
| Standard FM | Curved | 50-100 | High | Low |
| Rectified Flow | Straight | 1-10 | Very High | Medium |
| Prior Blending | Curved | 30-50 | Very High | Medium |
| Multimodal FM | Curved | 50-100 | High | High |
9. Applications
9.1 Text-to-Image Generation
Stable Diffusion + Flow Matching:
- Replace diffusion ODE with Flow Matching
- Faster training (no score matching)
- Flexible flow design
- Comparable or better FID scores
Example: SD3 (Stable Diffusion 3) uses rectified flows.
9.2 Molecular Generation
Advantages:
- Continuous representation of molecules
- Exact likelihood computation
- Flexible prior design
- Fast sampling
9.3 Audio Synthesis
Benefits:
- High-fidelity audio generation
- Faster than diffusion models
- Controllable generation via conditioning
9.4 Video Generation
Temporal Flow Matching:
- Model spatiotemporal dynamics
- Straight trajectories reduce artifacts
- Efficient sampling for long sequences
9.5 3D Generation
Point Cloud / Mesh Generation:
- Continuous 3D structure modeling
- Optimal transport preserves geometry
- Fast generation for interactive applications
10. Practical Implementation
10.1 Network Architecture
Common choices:
- U-Net (from diffusion models):
  - Proven architecture
  - Multi-scale processing
  - Attention mechanisms
- Transformer:
  - Global receptive field
  - Scalable to high dimensions
  - Good for sequential data
- MLP (for low-dimensional data):
  - Simple and efficient
  - Good for toy examples
Time embedding:
```python
class SinusoidalTimeEmbedding(nn.Module):
    ...
```
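A minimal sketch of what a module with this name commonly looks like, following the sinusoidal positional-encoding pattern used in Transformers and diffusion U-Nets. The constructor arguments `dim` and `max_period` and the sin-then-cos layout are assumptions, not details taken from the original listing.

```python
import math
import torch
import torch.nn as nn

class SinusoidalTimeEmbedding(nn.Module):
    """Map scalar times to a vector of sines and cosines at several frequencies."""

    def __init__(self, dim, max_period=10000.0):
        super().__init__()
        if dim % 2 != 0:
            raise ValueError("dim must be even")
        half = dim // 2
        # Geometric ladder of frequencies, starting at 1 and decreasing.
        freqs = torch.exp(-math.log(max_period) * torch.arange(half) / half)
        self.register_buffer("freqs", freqs)

    def forward(self, t):
        # t: shape (batch,) -> embedding: shape (batch, dim)
        args = t[:, None].float() * self.freqs[None, :]
        return torch.cat([torch.sin(args), torch.cos(args)], dim=-1)

embed = SinusoidalTimeEmbedding(dim=8)
emb = embed(torch.tensor([0.0, 0.5, 1.0]))
```

Since Flow Matching uses $t \in [0, 1]$ rather than integer timesteps, the lowest frequencies barely change over that range. Some implementations rescale `t` (for example by 1000) before embedding it, which is a design choice worth testing rather than a requirement.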
10.2 Training Best Practices
1. Time sampling:
- Uniform: $t \sim \mathcal{U}[0, 1]$
- Importance sampling: More weight on difficult regions
2. Data normalization:
- Normalize data to $[-1, 1]$ or $[0, 1]$
- Ensure numerical stability
3. Learning rate scheduling:
- Warmup: Gradually increase LR
- Cosine decay: Smooth decrease
4. Batch size:
- Larger batches = more stable gradients
- Typical: 64-256
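The warmup-plus-cosine schedule from item 3 can be written as a plain function of the step index. The function name `lr_at_step` and the default values are illustrative assumptions; in PyTorch the same function could be wrapped in `torch.optim.lr_scheduler.LambdaLR` after dividing by the base learning rate.

```python
import math

def lr_at_step(step, total_steps, base_lr=1e-3, warmup_steps=1000, min_lr=0.0):
    """Linear warmup to base_lr, then cosine decay to min_lr."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * min(1.0, progress)))
    return min_lr + (base_lr - min_lr) * cosine

# Learning rate for every step of a 10,000-step run.
schedule = [lr_at_step(s, total_steps=10_000) for s in range(10_000)]
```

The rate rises linearly for the first 1,000 steps, peaks at `base_lr`, and then decays smoothly towards `min_lr` by the end of training.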
10.3 Debugging Checklist
- [ ] Verify boundary conditions: $x_0 \sim \mathcal{N}(0, I)$, $x_1 \sim$ data
- [ ] Check trajectory continuity (no jumps)
- [ ] Monitor loss convergence
- [ ] Test ODE solver with different step sizes
- [ ] Validate likelihood computation
- [ ] Compare sample quality with baseline
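The first checklist item can be automated with a few empirical statistics. The sketch below does this for the OT conditional path around a single data point; the helper name `ot_interpolate`, the sample size, and the tolerances are assumptions chosen for illustration.

```python
import numpy as np

def ot_interpolate(x0, x1, t, sigma_min=1e-4):
    """x_t = (1 - (1 - sigma_min) t) x0 + t x1."""
    return (1.0 - (1.0 - sigma_min) * t) * x0 + t * x1

rng = np.random.default_rng(0)
x1 = np.array([3.0, -2.0])                 # a single stand-in data point
x0 = rng.standard_normal((20000, 2))       # prior samples

start = ot_interpolate(x0, x1, t=0.0)      # should look like N(0, I)
end = ot_interpolate(x0, x1, t=1.0)        # should concentrate at x1

checks = {
    "start mean ~ 0": np.allclose(start.mean(axis=0), 0.0, atol=0.05),
    "start std ~ 1": np.allclose(start.std(axis=0), 1.0, atol=0.05),
    "end mean ~ x1": np.allclose(end.mean(axis=0), x1, atol=1e-3),
    "end std ~ sigma_min": np.allclose(end.std(axis=0), 1e-4, atol=1e-5),
}
```

Any entry of `checks` that is `False` points at a mistake in the interpolation (for example swapped endpoints or a wrong sign in the noise coefficient) before any network is trained.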
11. Comparison with Other Methods
11.1 Flow Matching vs [[Diffusion Model|Diffusion Models]]
| Aspect | Flow Matching | Diffusion Models |
|---|---|---|
| Foundation | ODE theory | [[Stochastic Differential Equation (SDE)|SDE]] theory |
| Objective | Vector field regression | Score matching / ELBO |
| Flexibility | High (arbitrary flows) | Constrained (noise schedule) |
| Training | Simple regression | Complex derivation |
| Sampling | ODE integration | [[Stochastic Differential Equation (SDE)|SDE]]/ODE integration |
| Likelihood | Exact | Exact (via ODE) |
| Theory | Simpler | More complex |
11.2 Flow Matching vs [[Continuous Normalizing Flow]]
| Aspect | Flow Matching | Traditional CNF |
|---|---|---|
| Training | Simulation-free | Requires ODE simulation |
| Speed | Fast | Slow (backprop through ODE) |
| Architecture | Flexible | Constrained (trace computation) |
| Scalability | High | Limited |
11.3 Flow Matching vs GAN
| Aspect | Flow Matching | GAN |
|---|---|---|
| Training stability | Stable (MSE loss) | Unstable (minimax game) |
| Mode coverage | Broad (regression over all training data) | Mode collapse possible |
| Likelihood | Tractable (via ODE) | Intractable |
| Sample quality | High | High |
| Sampling speed | Medium (ODE steps) | Fast (1 step) |
11.4 Generative Model Comparison
| Model | Training | Sampling | Likelihood | Stability | Quality |
|---|---|---|---|---|---|
| GAN | Adversarial | 1 step | Intractable | Unstable | High |
| VAE | ELBO | 1 step | Lower bound | Stable | Medium |
| Normalizing Flow | Likelihood | Parallel | Exact | Stable | Medium-High |
| Diffusion | Score matching | 50-1000 steps | Exact | Stable | Very High |
| Flow Matching | Vector regression | 10-100 steps | Exact | Stable | Very High |
12. Core Formula Cards
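[!QUOTE] Flow Matching Objective
$$\mathcal{L}_{\text{FM}}(\theta) = \mathbb{E}_{t,\, x \sim p_t(x)} \left\| v_\theta(t, x) - u_t(x) \right\|^2$$
[!QUOTE] Conditional Flow Matching
$$\mathcal{L}_{\text{CFM}}(\theta) = \mathbb{E}_{t,\, x_1 \sim q(x_1),\, x \sim p_t(x \mid x_1)} \left\| v_\theta(t, x) - u_t(x \mid x_1) \right\|^2$$
[!QUOTE] Optimal Transport Flow
$$x_t = (1 - (1 - \sigma_{\min})\,t)\,x_0 + t\,x_1, \quad u_t(x \mid x_1) = \frac{x_1 - (1 - \sigma_{\min})\,x}{1 - (1 - \sigma_{\min})\,t}$$
[!QUOTE] Gaussian Conditional Flow
$$p_t(x \mid x_1) = \mathcal{N}\left(x \mid \mu_t(x_1),\, \sigma_t(x_1)^2 I\right), \quad u_t(x \mid x_1) = \frac{\sigma_t'(x_1)}{\sigma_t(x_1)} \left(x - \mu_t(x_1)\right) + \mu_t'(x_1)$$
[!QUOTE] Continuity Equation
$$\frac{\partial p_t(x)}{\partial t} + \nabla \cdot \left(p_t(x)\, v_t(x)\right) = 0$$
[!QUOTE] Likelihood Computation
$$\log p_1(x_1) = \log p_0(x_0) - \int_0^1 \nabla \cdot v_\theta(t, x_t)\, dt$$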
13. Recent Advances (2023-2024)
13.1 Rectified Flow
Key Paper: Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow (Liu et al., 2022)
Contributions:
- Iterative straightening of trajectories
- 1-2 step generation with high quality
- Connections to optimal transport
13.2 Flow Matching for Large-Scale Generation
SD3 (Stable Diffusion 3):
- Uses rectified flows instead of diffusion
- Better sample quality
- Faster training and sampling
- Multimodal conditioning
13.3 Flow Matching + Consistency Models
Idea: Combine Flow Matching with consistency models for 1-step generation.
Method:
- Train Flow Matching model
- Distill to consistency model
- Achieve 1-step generation
13.4 Multimodal Flow Matching
Challenge: Standard flows are deterministic.
Solutions:
- Mixture of flows
- Stochastic interpolation
- Latent variable models
Related Concepts
- [[Diffusion Model]]
- [[Continuous Normalizing Flow]]
- [[Probability Flow ODE]]
- [[Stochastic Differential Equation (SDE)]]
- [[Fokker-Planck Equation]]
- [[Optimal Transport]]
- [[Rectified Flows]]
- [[Score Function]]
- [[Neural ODE]]
- [[DPM-Solver]]
- [[DDIM]]
- [[Wiener Process]]
- [[Markov Process]]
- [[U-Net]]
- [[Generative Adversarial Network (GAN)]]
References
- Paper: Flow Matching for Generative Modeling (Lipman et al., 2023)
- Paper: Building Normalizing Flows with Stochastic Interpolants (Albergo & Vanden-Eijnden, 2023)
- Paper: Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow (Liu et al., 2022)
- Paper: Action Matching: Learning Stochastic Dynamics From Samples (Neklyudov et al., 2022)
- Paper: SE(3)-Stochastic Flow Matching for Protein Backbone Generation (Bose et al., 2023)
- Blog: Flow Matching: A New Paradigm for Generative Modeling - Lilian Weng
- Course: CS236 Deep Generative Models (Stanford)
- GitHub: https://github.com/atong01/conditional-flow-matching